Context-informed Knowledge Extraction from Document Collections to Support User Navigation

Authors

  • Mario Cataldi
  • Claudio Schifanella
  • K. Selçuk Candan

Abstract

Most existing document and web search engines rely on keyword-based queries. To find matches, these queries are processed by retrieval algorithms that rely on word frequencies, topic recentness, document authority and, in some cases, available ontologies. In this paper, we propose an innovative approach to exploring text collections using a novel keywords-by-concepts (KbC) graph, which supports navigation using both domain-specific concepts and the keywords that characterize the text corpus. The KbC graph is a weighted graph created by tightly integrating keywords extracted from documents with concepts obtained from domain taxonomies. Documents in the corpus are associated with the nodes of the graph based on evidence supporting contextual relevance; thus, the KbC graph supports contextually informed access to these documents. The construction of the KbC graph relies on a spreading-activation-like technique that mimics the way the brain links and constructs knowledge. We also present the CoSeNa (Context-based Search and Navigation) system, which leverages the KbC model as the basis for document exploration as well as contextually informed media integration.
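
The abstract does not specify how the KbC graph is built beyond calling it "spreading-activation-like". As a rough illustration of that family of techniques only, the sketch below runs a textbook spreading-activation pass over a small weighted graph mixing keyword and concept nodes. It is not the authors' algorithm: the function name `spread_activation`, the node names, the edge weights, and the `decay`, `threshold` and `max_steps` parameters are all hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical edge format: (node_a, node_b, weight), weight in [0, 1].
Edge = tuple[str, str, float]


def spread_activation(
    edges: list[Edge],
    seeds: dict[str, float],
    decay: float = 0.5,
    threshold: float = 0.01,
    max_steps: int = 3,
) -> dict[str, float]:
    """Generic spreading activation over an undirected weighted graph.

    Illustrative only (not the KbC construction procedure); decay,
    threshold and max_steps are assumed values.
    """
    # Build an undirected adjacency list from the weighted edges.
    adj: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))

    activation = dict(seeds)  # total activation accumulated per node
    frontier = dict(seeds)    # energy being spread in the current step
    for _ in range(max_steps):
        next_frontier: dict[str, float] = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, w in adj[node]:
                # Attenuate the pulse by the edge weight and a global decay.
                pulse = energy * w * decay
                # Ignore pulses too weak to matter.
                if pulse >= threshold:
                    next_frontier[neighbor] += pulse
        if not next_frontier:
            break
        # Add this step's energy to each receiving node's running total.
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


if __name__ == "__main__":
    # Hypothetical keyword/concept graph.
    edges = [
        ("keyword:graph", "concept:data structure", 0.8),
        ("concept:data structure", "concept:computer science", 0.6),
        ("keyword:graph", "keyword:node", 0.4),
    ]
    scores = spread_activation(edges, {"keyword:graph": 1.0})
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.3f}")
```

Starting from a single keyword, activation reaches related concepts in proportion to edge weights and path length, so nearby concepts end up ranked above distant ones. In this naive version energy can also flow back to the seed; real systems typically add refinements such as fan-out normalization, which this sketch omits.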

Similar resources

The Collaborative Knowledge Organization System

This chapter discusses folksonomies as a novel way of indexing documents and locating information based on user generated keywords. Folksonomies are considered from the point of view of knowledge organization and representation in the context of user collaboration within the Web 2.0 environments. Folksonomies provide multiple benefits which make them a useful indexing method in various contexts...

Exploratory Navigation in Large Multimedia Documents Using Context Lenses

The Context Lens (CL) is a focus+context visualization and navigation tool particularly suited for navigating large documents, or collections of documents. Context Lenses have been applied successfully to navigating Web pages, video collections and slide presentations. In this paper we discuss our experiences both with linear as well as with hierarchical Context Lenses. We focus on the use of i...

Concepts across the Interspace: Information Infrastructure for Community Knowledge

A global information infrastructure for knowledge manipulation must support effective analysis to correlate related objects. The Interspace is the coming global network, where knowledge manipulation is supported by concept navigation across community spaces. We have produced a working Interspace Prototype, an analysis environment supporting semantic indexing on community repositories. Scalable ...

User-centric Knowledge Extraction and Quality Assurance

An ontology is a machine readable knowledge collection. There is an abundance of information available for human consumption. Thus, large general knowledge ontologies are typically generated tapping into this information source using imperfect automatic extraction approaches that translate human readable text into machine readable semantic knowledge. This thesis provides methods for user-driven...

Metadata for Multidimensional Categorization and Navigation Support on Multimedia Documents

An increasing technological effort is spent on integrated representations of document collections and metadata. For instance the emerging XML standard offers opportunities to represent metadata in for, e.g., improving query and navigation support within web-based document collections. Despite this development, most applications of catalogue metaphors on the web ranging from small web site catal...

Journal title:
  • JMPT

Volume: 1

Publication date: 2010